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Abstract 

We  consider  a  least  squares  inverse  problem  with  a  model  for  inducer-mediated  reactivation  of  latent 
viruses.  We  illustrate  the  difficulties  associated  with  the  lack  of  sufficient  data  (quantity  as  well  as  form) 
in  such  problems.  The  modeling  and  estimation  efforts  described  suggest  new  experiments  as  well  as 
needed  model  extensions. 
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1  Introduction 


Mathematical  and  statistical  inverse  problem  techniques  in  the  context  of  the  biological  sciences  are  becoming 
increasing  prevalent  and,  consequently,  of  increasing  importance.  Classical  estimation  theory  has  been 
primarily  developed  in  a  statistical  context  (based  on,  e.g.,  asymptotic  distribution,  hypothesis  testing  and 
Bayesian  approaches  that  involve  large  data  sets)  for  relatively  simple  models.  However,  recent  advances  in 
both  theory  and  computation  suggest  that  classical  inverse  problem  techniques  (regularized  least  squares, 
constrained  maximum  likelihood,  etc.)  with  increasingly  complex  models  will  play  a  significant  role  in  near 
term  future  biological  modeling  research  efforts.  Model  development,  validation  and  comparison  techniques 
[8]  such  as  the  Kullback-Leibler  Information  “distance”  between  models  (also  called  the  discrimination 
information)  [16,  17],  the  Akaike  Information  Criterion  [2,  3],  the  Takeuchi  Information  Criterion  [24],  various 
Likelihood  Ratio  Tests  [8]  and  AN OVA  type  Hypothesis  Testing  [4,  5]  should  see  significant  use  with  nonlinear 
dynamical  systems  models.  However,  since  all  of  theses  statistically  based  techniques  require  the  availability 
of  sufficiently  rich  data  sets,  they  will  have  limited  success  and  impact  in  scientific  advancement  unless  inverse 
problem  and  estimation  “experts”  are  working  in  significant,  close  collaborations  with  biologists  to  design 
experiments  in  the  context  of  modeling  efforts. 

In  this  paper  we  illustrate  some  of  the  inherent  difficulties  that  one  can  anticipate  in  trying  to  develop  and 
validate  a  reasonably  complex  dynamical  model  with  only  “literature”  data  (often  referred  to  as  “cold  data” 
by  inverse  problem  investigators  since  it  was  not  collected  with  model  development  in  mind) .  We  do  this  in 
the  context  of  model  development  of  a  biologically  important  system  for  reactivation  of  latent  virus  under  the 
constraint  of  limited  data  (no  direct  observations  of  any  of  the  seven  model  compartments,  a  limited  number 
of  longitudinal  observations  per  experiment,  observations  of  only  percentages,  i.e.,  ratios,  of  cell  counts). 
The  modeling  and  least  squares  efforts  lead  in  this  case  to  specific  suggestions  for  design  of  new  experiments 
and  the  type  of  data  needed.  In  this  example,  while  some  parameters  can  be  “estimated”  from  literature 
data  by  considering  limiting  (as  t  — >  0  or  t  — >  oo)  set  point  values,  the  longitudinal  data  available  consists  of 
“%  viable  cells”  in  terms  of  ratios  of  latent  plus  replicating  host  cells.  This  data  leads  to  very  difficult  least 
squares  or  maximum  likelihood  inverse  problems  for  dynamic  parameters  as  evidenced  by  relatively  large 
standard  errors  and  to  lack  of  ability  to  validate  viral  compartment  dynamics  in  the  model.  However,  as  we 
shall  see,  the  modeling  effort  leads  to  suggestions  for  detailed  refined  models  as  well  as  new  experiments. 


2  A  Mathematical  Model 


We  consider  a  mathematical  model  that  describes  the  reactivation  of  a  latent  virus  by  chemical  inducers  at 
the  cellular  level.  Here  we  first  give  the  differential  equations  that  model  the  dynamics  of  host  cells  and  viral 
DNA  without  deriving  the  model.  Interested  readers  can  find  the  details  and  a  derivation  of  the  model  in 
[15].  When  a  latent  cell  line  is  growing  in  a  nutritious  environment,  the  cell  line  is  considered  to  be  uninduced 
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and  in  the  cases  considered  in  [15],  can  be  described  by  the  nonlinear  ODEs 


dHL 

dt 


(7 l  -  a0)HL 


dHR 

dt 


a0HL  —  di(Vi)HR 


dHpj 

dt 

dL 

dt 

dR 

dt 

dVr 

dt 


dRHL  +  di(Vi)HR  —  pHN 
(7 l  -  a0 )L 

o.qL  —  ( dj{Vi )  —  n  +  b)  R 
bR-(p  +  dI(VI))VI, 


(1) 


and 


VF(t)  =  VF0  +  f  pVi(u)du. 
J 1 0 


The  compartmental  variables  and  the  parameters  for  this  model  are  defined  in  Table  1  and  Table  2,  respec¬ 
tively.  Let  s  denote  the  concentration  level  of  chemical  inducing  agents  and  define  o(s)  and  SR(s)  to  be  the 
viral  reactivation  rate  and  the  host  cell  death  rate  caused  by  chemical  inducers.  Then  the  host  cells  and  viral 
dynamics  during  reactivation  (induced)  as  derived  in  [15]  are  governed  by  the  following  differential  equations 


dHL 

dt 


(7 l  ^  a(s))  Hl 


dHR 

dt 


a(s)HL  —  ( di(Vi )  +  SR(s))  Hr 


dHjsr 

dt 

dL 

dt 

dR 

dt 

dVr 

dt 


dLHL  +  ( di(Vi )  +  SR(s))  Hr  -  pHN 
(7 l  ^  a(s))L 

a(s)L  —  ( di(Vj )  +  5r(s)  —  n  +  b)  R 
bR  -  (p  +  d/(Vr)  +  SR(s))  Vi 


(2) 


and 


VF(t)  =  VF0  +  [  pVi{u)du. 
Jt0 
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Compartment 

Symbol 

Units 

Host  cells  (latent  virus  only) 

hl 

number  of  cells 

Host  cells  (lytic  virus  only) 

Hr 

number  of  cells 

Nonviable  host  cells 

Hn 

number  of  cells 

Latent  virus 

L 

DNA  copies 

Lytic  virus 

R 

DNA  copies 

Intracellular  virus 

Vr 

DNA  copies 

Free  virus 

Vf 

number  of  virions 

Table  1:  ODE  Model  Compartments. 


Parameter  Symbol  Units 


Net  growth  of  latent  host  cells  'jl  hr”1 

Nonviable  cell  degradation  p,  hr-1 

Natural  death  of  latent  host  cells  g?l  hr-1 

Spontaneous  reactivation  of  latent  host  cells  au  hr- 1 

Cell  death  due  to  viral  lysis  c  hr-1 

Synthesis  of  viral  DNA  k  hr-1 

Sequestration  of  viral  DNA  for  encapsulation  b  hr-1 

Packaging  and  secretion  of  virions  p  hr”1 


Viral  DNA  per  lytic  host  cell  ut 


Table  2:  Uninduced  Parameter  Description. 
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3  Parameter  Estimation 


We  first  estimate  many  of  the  uninduced  model  parameters  from  experimental  observations  cited  in  the 
literature.  In  the  uninduced  cell  cultures,  experimentalists  normally  observe  the  fraction  as  of  lytic  host  cells, 
the  fraction  Nr  of  nonviable  host  cells,  host  cell  doubling  time  Dp  and  average  copies  ut  of  viral  DNA  per  cell. 
These  values  can  be  approximated  using  values  from  the  experimental  literature  [10,  14,  18,  19,  20,  21,  25,  27] 
and  they  mathematically  correspond  to 


( _ ^ _ ) 

\HL  +  HR  +  HN)t^  x 

(  L  +  R  +  V r  \ 

\HL  +  HR  +  HN)t^0 0 

It  follows  that  the  uninduced  parameters  are  related  by 

ln(2  )/Dp  +  fj,Nr 

dL  =  1 - m - 

1  —  as  —  J\r 


Us  {hr  +  Hr  +  hJ)^ 

2  _  {Hr  +  Hr  +  HN)t=Dp 

{Hr  +  Hr  +  H]\r)t= 0 


(3) 


c 


ln(2) 

^sVlA^p 


{Nr 


1)  +  — (1  -  as  -  Nr) 

asViA 


ao 


1L  - 


M2) 

Dp 


Ra  (1  —  as  —  Nr)  (  ln(2)  \ 

TV, A  as  \rL  Dp  ) 


(4) 


^7L-ln(2  )/Dp\ 

k  =  - - -  {Ra  -  NrRA  -  nT  +  asVIA )  +  b 

\  asRA  J 


nr  —  cis{Ra  +  Via) 

n  =  - i - m - > 

1  —  as  —  l\r 

where  n  =  L/Hr  denotes  the  average  number  of  latent  viral  DNA  copies  per  latently  infected  host  cell 
which  is  assumed  to  be  constant  for  the  model  derivation  in  [15] .  Here  Ra  and  V, a  denote  the  average  viral 
replicating  DNA,  respectively,  and  viral  intracellular  DNA  and  are  defined  by 


Via  = 


(5) 


Although  the  values  of  Ra  and  Via  are  the  only  two  values  in  the  uninduced  model  parameters  that  cannot 
be  obtained  from  the  literature;  they  are  chosen  for  our  simulations  so  that  (4)  holds  and  all  the  parameters 
involved  are  positive. 


We  next  formulate  an  inverse  problem  for  the  induced  model  to  obtain  the  viral  reactivation  rate  a(s)  and 
the  host  cell  death  rate  5r{s)  by  the  chemical  inducers  using  cell  viability  data.  We  use  two  independent 
sets  of  experimental  data  from  the  literature  [26,  27]  wherein  the  data  are  given  in  percentage  of  viable  cells 
at  different  inducer  (butyrate)  concentration  and  are  collected  every  24  hours  over  a  maximum  period  of  five 
days. 
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We  considered  the  special  case  with  parametric  functional  forms  5r(s)  =  dcs,  a(s)  =  acs  +  ao  and  let 
q  =  (Sc,ac)  and  x  =  (HL,  HRl  HN,  L,  R,Vj,Vp)T .  Then  the  differential  equations  in  the  model  for  the 
induced  case  (2)  can  be  written  in  a  general  form 


x  =  g(t,x,s,q)  (6) 

z(0)  =  x0, 

where  g  :  M+  x  Kr  x  K+  x  Km  — >  Mr  for  r  =  7,  m  =  2,  and  x$  =  (Hlq,  Hr0,  Hn0,  L0,  R0,  Vj0,  Vfo)T  ■  To 
correspond  with  the  experimental  data  given  in  percentage  of  viable  cells,  we  define  the  outputs  of  the  model 


Htotaijt ,  s,  q )  -  HN(t,  s,  g) 
Htotal  (^;  q) 


t,s>  0, 


where  Htotai  =  Hl+Hr+Hn.  In  each  least  squares  parameter  fit  to  the  data,  the  data  is  longitudinal  (taken 
at  tk)  and  across  several  levels  s,;  of  inducer.  This  is  indexed  by  t3  =  (tk,  Si )  for  k  =  1, . . . ,  K,  i  =  1, . . . ,  I, 
and  observations  y3  for  the  model  values  fj(q)  =  f(tk,Si,q),  j  =  1  =  KI.  Then  we  construct  the 

ordinary  least  square  (OLS)  inverse  problem  by  seeking  q  that  minimize  the  cost  criterion 


N 


J(9)  =  -  fM)\2> 

3= 1 


(7) 


where  {yj}  denotes  the  experimental  data. 


Once  the  optimal  q  are  found  using  the  Nelder-Mead  algorithm,  standard  errors  and  confidence  intervals  are 
computed  by  using  the  asymptotic  theory  which  we  proceed  to  outline.  Assume  N  scalar  longitudinal/inducer 
level  observations  (time/inducer  series  of  numbers  or  ratios  of  numbers  of  cells  as  described  below)  are 
represented  by  the  statistical  model 

Yj=  fj(q0)  + ejy  j  =  1,2,...  N,  (8) 

where  fj(qo)  is  the  model  for  the  observations  in  terms  of  the  state  variables  and  qo  £  Rm  is  a  “set”  of 
theoretical  “true”  parameter  values  (assumed  to  exist  in  a  standard  statistical  approach).  Assume  further 
that  the  errors  t:n  j  =  1,2,  ...,IV  in  the  statistical  model  of  the  observation  or  measurement  process  (8) 
are  independent  identically  distributed  ( i.i.d .)  random  variables  with  mean  E[ej\  =  0  and  constant  variance 
var[ej\  =  (7q  where  of  course  Cg  is  unknown.  The  constant  variance  assumption  can  be  validated  by  use  of 
standard  residual  plots  with  the  data  used  in  our  inverse  problems.  It  follows  that  the  observations  Yj  are 
i.i.d.  with  mean  E[Yj]  =  fj(qo)  and  variance  var\Yj\  =  «Tq. 

Using  the  data  {y3}  for  the  observation  process  {Yj}  with  the  model,  J(q)  is  optimized  by  finding  the  OLS 
estimator  q  in  (7).  Note  that  the  estimator  goLS  is  also  a  random  variable  with  a  distribution  called  the 
sampling  distribution  because  Yj  is  a  random  variable.  Knowledge  of  this  sampling  distribution  provides 
uncertainty  information  (e.g.,  standard  errors)  for  the  numerical  values  of  q  obtained  using  a  specific  data 
set  {yj}  (i.e.,  a  realization  of  {Yj})  when  minimizing  J(q). 

Under  reasonable  assumptions  on  smoothness  and  regularity  (the  smoothness  requirements  for  model  so¬ 
lutions  are  readily  verified  using  continuous  dependence  results  for  ordinary  differential  equations  in  our 
example;  the  regularity  requirements  involve,  among  others,  conditions  on  how  the  observations  are  taken  as 
sample  size  increases,  i.e.,  as  N  — ►  oo),  the  standard  nonlinear  regression  approximation  theory  ([11],  [12], 
[13],  and  Chapter  12  of  [22])  for  asymptotic  (as  N  — >  oo)  distributions  can  be  invoked.  This  theory  yields 
that  the  sampling  distribution  for  the  estimate  q(Y)  =  5ols(^)i  where  Y  =  {Yj}^=1,  is  approximately  a  m- 
multivariate  Gaussian  with  mean  E[q(Y)]  «  q0  and  covariance  matrix  cov[q(Y)]  ss  S0  =  o’q[xT (qo)x((lo)]~1  ■ 
Here  x(g)  =  Fq(q)  is  the  N  x  m  sensitivity  matrix  with  elements 

Xjk{q)  =  ^  and  Fq(q)  =  (flq{q), ...,  fNq{q))T- 
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That  is,  for  N  large,  the  sampling  distribution  approximately  satisfies 


qoLs(y )  ^■A/'m(go,CTo[xT(go)x(9o)]  X)  :=  A^m(g0, So)- 


(9) 


The  elements  of  the  matrix  \  =  (Xjk)  can  be  estimated  using  the  forward  difference 

v  (ns  9fj(q)  _  fj(q  +  hk)- fj{q) 

xik{q)  =  - ra — • 


where  hk  is  an  m- vector  with  nonzero  entry  in  only  the  kth  component,  or  using  sensitivity  equations  (see  [7] 
and  the  references  therein).  Here  we  chose  the  sensitivity  equation  approach  and  since  g0,  er0  are  not  known, 
we  approximate  them  in  Eo  =  0o[xT(<7o)x(dlo)]_1-  Following  standard  practice,  Eo  is  approximated  by 

E0«S(g)  =  d2[xT(g)x(9)]-1 


dF 

where  q  is  the  parameter  estimate  obtained  from  minimizing  (7)  and  x(?)  =  (g).  ^rom  the  outputs  defined 

dx 

in  (6),  it  suffices  to  compute  the  sensitivities  —  by  solving  the  r  x  m  matrix  linear  variational  differential 

dq 

equation  (called  the  the  sensitivity  equation  in  the  applied  mathematics  and  engineering  literature) 


d  ( cbc\  dg  dx  dg 
dt  \dq)  dx  dq  +  dq' 


(10) 


The  matrix  coefficient  and  the  forcing  function  in  this  equation  are  evaluated  along  solutions  of  the  system 
equation  (6).  The  approximation  a2  to  a 2  is  given  by 


a 


2 

o 


1 


N  —  TO 


N 


-  fj(q)\2- 

3=  i 


Standard  errors  to  be  used  in  confidence  interval  calculations  are  thus  given  by  SEk(q)  =  ^E kk(q),  k  = 
1, 2, . . . ,  m  (see  [9]). 

Finally,  in  order  to  compute  the  confidence  intervals  (at  the  100(1  —  c)%  level)  for  the  estimated  parameters 
Sc  and  ac,  we  define  the  confidence  level  parameters  associated  with  the  estimated  parameters  so  that 


P{dk  -  tc/2SEk(q)  <  qk  <  qk  +  tc/2SEk{q))  =  1  -  c,  (11) 

where  c  €  [0,1],  and  tc/2  €  R+.  For  a  given  c  value  (small,  say  c  =  .05  for  95%  confidence  intervals),  the 
critical  value  tc/2  is  computed  from  the  Student’s  t  distribution  tN~m  with  N—m  degrees  of  freedom  since  for 
each  of  the  data  sets  available  to  us  we  have  N  <  30.  The  value  of  tc/2  is  determined  by  P[T  >  tc/2 )  =  c/2 
where  T  ~  tN~m. 


4  Numerical  Results 


In  the  uninduced  experiments,  Zoeteweij’s  group  observed  7.7%  nonviable  cells  while  Yu,  et  al. ,  reported  16% 
nonviable  cells  in  their  data.  Therefore,  we  let  Nr  =  0.077  and  0.16,  ut  =  69  and  68  to  correspond  with  the 
data  of  Zoeteweij,  et  al.,  and  Yu,  et  al.,  respectively.  Furthermore,  the  fraction  (percentage)  of  spontaneous 
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lytic  host  cells  as  and  host  cell  doubling  time  Dp  are  chosen  to  be  within  the  ranges  reported  in  the  literature, 

1. e.,  as  =  0.02  and  Dp  =  24  hours.  In  Figure  1,  we  depict  the  percentage  of  nonviable  cells  and  spontaneously 
reactivated  host  cells  from  the  simulation  of  the  uninduced  model  case  (1)  for  the  Zoeteweij,  et  al. ,  and  the 
Yu,  et  al .,  experimental  data.  The  initial  condition  for  all  compartments  are  zero  except  for  Hr o  =  1.0  x  106 
and  Lq  =  1.0  x  10'.  The  values  of  the  parameters  used  in  the  simulations  presented  here  are  tabulated  in 
Table  3.  From  Figure  1,  we  observe  that  the  nonviable  cell  and  spontaneously  reactivated  cell  percentages 
asymptotically  reach  the  specified  “equilibrium”  values  by  1000  hours  where  we  define  “equilibrium”  to  be 
the  constants  in  (3)  and  (5).  Even  though  the  cell  culture  is  growing  exponentially,  the  characteristics  of  the 
cell  culture  eventually  become  those  constants  in  (3)  and  (5).  The  simulations  of  the  average  number  of  lytic 
and  intracellular  DNA  per  lytically  infected  host  cell,  R/H r  and  Vi/Hr,  respectively  are  shown  in  Figure 

2.  Similarly,  we  see  both  R/Hr  and  Vi/Hr  converge  to  the  specified  equilibrium  values  of  Ra  =  2500  and 
Via  =  500,  respectively  by  1000  hours. 

The  equilibrated  simulations  and  the  estimated  parameters  in  Table  3  of  the  uninduced  model  case  are 
assumed  to  be  the  properties  of  the  uninduced  (control)  cell  cultures  that  are  used  in  the  reactivation 
experiments.  In  the  induced  model  case,  let  Cg  be  the  initial  number  of  cells;  then  the  initial  conditions 
are  set  to  be  HL{ 0)  =  (1  -  Nr  -  as)C0 ,  HR{ 0)  =  asC0 ,  HN( 0)  =  NrC0,  L{ 0)  =  nHL,  R( 0)  =  RaHr , 
V/(0)  =  ViaHr ,  and  Ff(0)  =  0.  As  mentioned  above,  the  functional  form  of  the  induced  lytic  cell  death 
rate  Sr(s)  and  reactivation  rate  a(s)  are  assumed  to  be  affine  functions  of  the  form  Sr(s)  =  Scs  and 
a(s)  =  acs  +  ag  as  a  first  approximation  to  the  “true”  forms  which  are  not  known.  The  optimal  induced 
parameters  q  =  [<5c,dc]  are  obtained  by  fitting  the  two  sets  of  experimental  data  from  Zoeteweij,  et  al., 
[27]  and  Yu,  et  al.,  [26]  independently  in  minimizing  (7).  The  optimized  induced  model  parameters  are 
tabulated  in  Table  4  along  with  the  standard  errors  and  confidence  intervals  which  are  calculated  from  the 
mathematical  and  statistical  method  presented  in  Section  3.  We  hasten  to  caution  that  here  we  had  at 
most  iV  =  16  observations  when  estimating  the  two  parameters  ( Sc ,  ac)  and  thus  the  following  remarks 
result  from  using  asymptotic  distributions  when  it  may  be  unwarranted.  Nonetheless,  from  Table  4,  we 
see  that  the  values  of  the  estimated  parameters  of  both  groups  are  within  an  order  of  magnitude  of  each 
other.  In  addition,  for  both  experimental  data  sets,  the  standard  errors  of  the  estimated  reactivation  rate 
constants  dc  are  at  least  one  order  of  magnitude  less  than  the  parameter  values  while  the  standard  errors  of 
the  induced  death  rate  constants  Sc  have  the  same  order  of  magnitude  as  the  parameter  values.  Thus,  we 
might  argue  for  more  confidence  in  the  values  obtained  for  ac  than  values  obtained  for  Sc.  (But  this  could 
also  be  a  result  of  using  asymptotic  analysis  when  insufficient  data  has  been  used.)  The  estimated  induced 
parameters  q  =  [<5C,  ac\  are  then  used  to  generate  the  viable  cell  percentage  with  (2)  and  then  plotted  against 
the  experimental  data  from  Zoeteweij,  et  al.,  and  Yu,  et  al,  in  Figure  3.  It  can  be  seen  from  Figure  3  that 
the  induced  simulations  qualitatively  agree  with  both  sets  of  experimental  data. 

The  number  of  free  virions  (Vf)  simulated  from  the  induced  equation  (2)  with  optimized  parameters  q  are 
plotted  in  Figure  4.  From  the  figure,  we  observe  that  there  is  a  three  to  four-fold  increase  in  free  virions 
produced  at  0.3mM  butyrate  level  compared  to  3mM  butyrate  concentration  level.  This  observation  in  the 
simulations  agree  qualitatively  with  the  experiments  from  Yu,  et  al.  In  [26],  Yu  and  his  group  report  that 
with  high  butyrate  concentrations  (1.5  and  3  mM),  there  is  a  great  increase  in  lytic  activity  but  also  a 
significant  increase  in  cell  death.  Therefore,  very  few  free  virions  are  produced  in  the  experiments  due  to 
massive  cell  death  before  the  completion  of  the  lytic  program.  However,  with  a  lower  concentration  level  of 
butyrate  (<  0.3  mM)  they  observe  much  less  cell  death  and  a  significant  secretion  of  free  virions.  The  optimal 
butyrate  dosage  threshold  that  maximizes  viral  production  is  numerically  computed  from  equation  (2)  and 
depicted  in  Figure  5  for  the  two  data  sets.  From  Figure  5  we  interpret  that  the  butyrate  concentrations  below 
the  optimal  threshold  suggest  that  the  reactivation  activity  of  latent  virus  by  the  inducers  is  not  maximized. 
On  the  other  hand,  the  butyrate  concentrations  above  the  threshold  imply  that  the  concentration  of  the 
inducers  is  too  toxic  and  the  lytically  replicating  cells  are  killed  before  virus  is  produced. 
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Figure  1:  Uninduced  simulations  of  spontaneously  reactivated  cell  (100  x  Hn/(Htota.i))  and  nonviable  cell 
percentage  (100  x  Hn / {Htot&i))  for  the  data  of  Zoeteweij  (dashed  lines),  et  al. ,  and  Yu  (solid  lines),  et  al. 


Parameter  Symbol  Zoeteweij,  et  al .,  data  Yu,  et  al.,  data  Units 


Net  growth  of  latent  host  cells  jL 


Nonviable  cell  degradation  /i 

Natural  death  of  latent  host  cells 
Spontaneous  reactivation  of  latent  host  cells  ao 

Cell  death  due  to  viral  lysis  c 

Synthesis  of  viral  DNA  k 

Sequestration  of  viral  DNA  for  encapsulation  q 

Packaging  and  secretion  of  virions  p 

Viral  DNA  per  lytic  host  cell  ut 

Induced  reactivation  ac 

Induced  death  Sc 


3.00 

X 

10"2 

3.00 

X 

10"2 

hr-1 

1.00 

X 

10"5 

1.00 

X 

10”5 

hr-1 

1.98 

X 

10"3 

5.22 

X 

10"3 

hr-1 

1.12 

X 

10"3 

1.12 

X 

10"3 

hr’1 

4.33 

X 

10"5 

3.40 

X 

10"5 

hr”1 

7.11 

X 

10"2 

6.65 

X 

10"2 

hr”1 

2.08 

X 

10"2 

2.08 

X 

10"2 

hr’1 

5.36 

X 

10"2 

5.83 

X 

10"2 

hr-1 

69 

68 

- 

5.51 

X 

10_1 

1.40 

X 

10-1 

hr’1 

5.13 

X 

10"3 

6.76 

X 

10"3 

hr"1 

Table  3:  Parameters  from  the  uninduced  model  (1)  are  calculated  from  (4)  with  constants  as  =  0.02, 
Nr  =  0.077  or  0.16,  Dp  =  24  hr,  n  =  10,  Via  =  500,  and  Ra  =  2500.  Parameters  from  the  induced  model 
(2)  are  obtained  from  fits  to  experimental  data. 
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TIME  (HR) 


Figure  2:  Uninduced  simulations  of  the  average  number  of  lytic  viral  DNA  copies  R/Hr  and  the  average 
number  of  intracellular  viral  DNA  copies  Vi/Hr  per  lytically  infected  host  cell  for  the  data  of  Zoeteweij 
(dashed  lines),  et  al. ,  and  Yu  (solid  lines),  et  al. 
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Figure  3:  Cell  viability  experimental  measurements  (symbols)  and  induced  model  (2)  simulations  (solid  lines) 
using  fitted  parameters  for  linear  functions  a(s)  and  6r(s):  (a)  Zoeteweij,  et  al. ,  circles  (0  mM),  squares 
(0.03  mM),  triangles  (0.3  mM),  and  diamonds  (3  mM),  ac  =  0.551,  6C  =  5.13  x  10-3  and  (b)  Yu,  et  al., 
circles  (0  mM),  triangles  (0.3  mM),  stars  (1.5  mM),  and  diamonds  (3  mM),  ac  =  0.140,  5C  =  6.76  x  10-3. 
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VIRIONS 


TIME  (HR) 


(a) 


TIME  (HR) 


(b) 

Figure  4:  Induced  model  (2)  simulations  of  free  virions  using  optimized  parameters  for  linear  functions  a(s) 
and  Sr(s)  for  experimental  data  from  (a)  Zoeteweij,  et  al and  (b)  Yu,  et  al. 
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TIME  (HR) 


Figure  5:  The  optimal  butyrate  concentration  to  maximize  the  quantity  of  virions  produced  as  a  function  of 
the  elapsed  induction  time  for  data  from  Zoeteweij  (dashed  line),  et  al.,  and  Yu  (solid  line),  et  al. 


Data 

Parameter 

Estimated  Value 

Standard  Error 

Confidence  Interval 

Zoeteweij,  et  al. 

OLc 

5.51  x  KT1 

1.03  x  10~2 

[5.29  x  10_1, 5.73  x  10"1] 

K 

5.13  x  KT3 

3.25  x  10"3 

[-1.83  x  10-3, 1.21  x  10^2] 

Yu,  et  al. 

Olc 

1.40  x  10"1 

7.84  x  10"3 

[1.23  x  10_1, 1.57  x  10"1] 

K 

6.76  x  10~3 

2.88  x  10"3 

[5.30  x  10-4, 1.30  x  10"2] 

Table  4:  Estimated  parameter  values,  standard  errors,  and  confidence  intervals 
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5  Discussion  and  Concluding  Remarks 


After  carrying  out  the  inverse  problem  with  this  preliminary  model,  numerical  simulations  using  the  resulting 
estimated  parameters  provide  good  qualitative  fits  of  cell  viability  for  two  independent  data  sets  of  KSHV 
being  induced  by  butyrate.  However,  our  findings  also  strongly  motivate  the  need  for  more  longitudinal 
data  to  support  model  development  and  validation.  With  the  cell  viability  data  now  available,  we  can  only 
directly  validate  the  host  cells  dynamics  and  indirectly  justify  the  dynamics  of  the  viral  DNA  in  this  model. 
Specifically,  quantitative  experimental  measurements  of  cell-associated  DNA  (L  +  R  +  Vj)  and  free  virions 
(Vf)  are  needed  to  evaluate  model  predictions  for  the  viral  compartments.  We  note  that  the  experimental 
data  we  require  are  not  the  typical  data  collected  in  experiments.  In  other  words,  typical  experimental 
data  are  relatively  qualitative  data  recorded  at  one  time  point  while  quantitative  longitudinal  data  are 
needed  to  verify  the  host  cell  and  viral  dynamics  presented  in  this  model.  Indeed,  in  almost  all  efforts  with 
dynamical  models,  longitudinal  data  is  essential.  These  time  series  data  would  also  permit  determination 
(i.e.,  estimation)  of  the  free  parameters  7 1  and  g,,  as  well  as  the  unknown  constants  Ra  and  Via-  We  also 
note  that  Ra  and  Via  parameter  sensitivity  tests  reveal  that  the  optimal  parameter  values  Sc  and  ac  are 
relatively  insensitive  to  variations  in  Ra  or  Via,  since  varying  Ra  or  Via  by  ±  5%  produced  3%  or  less 
variation  in  the  optimized  parameter  values  from  the  corresponding  inverse  problems. 

In  addition  to  the  fits  of  the  linear  form  for  a(s)  and  Sr(s),  we  also  fitted  the  model  in  OLS  problems  where 
a(s)  and  6r(s)  are  allowed  to  be  of  Michaelis-Menton  form  and/or  sigmoid  shape  functions.  In  Figure  3 
from  the  previous  section,  we  see  that  the  fits  to  cell  viability  are  reasonable  with  the  simple  linear  functions 
for  a(s)  and  Sr(s).  We  also  found  that  the  fits  of  the  induced  equations  to  cell  viability  data  were  relatively 
insensitive  to  more  complicated  functional  forms  of  a(s)  and  5r(s)  (the  results  are  not  shown  here).  In  the 
future,  instead  of  estimating  a(s)  and  5r(s)  with  some  a  priori  parameterizations,  we  could  estimate  shape 
of  the  functional  form  using  piece-wise  linear  splines  or  other  approximations  as  has  been  successfully  done 
in  other  problems  with  temporally  varying  parameters  and  dynamic  coefficients  [1,  5].  However,  to  validate 
the  improvement  provided  by  such  models  with  sophisticated  model  comparison  techniques,  richer  data  sets 
are  again  needed. 

With  this  preliminary  mathematical  model  as  an  initial  study  of  the  reactivating  mechanism  of  latent  viruses 
by  chemical  inducers,  there  are  rather  obvious  modifications  we  would  like  to  pursue  in  developing  future 
generations  of  the  model.  First,  instead  of  assuming  a  constant  (net)  growth  rate  for  the  host  cells,  a 
nonconstant  assumption  on  the  host  cells  (net)  growth  rate  should  be  added  especially  after  48  hours.  Also, 
the  initial  model  presented  here  is  based  on  the  assumption  that  the  host  cells  are  of  “all  or  nothing”  type. 
That  is,  a  given  host  cell  either  has  all  latent  viral  DNA  (Hi)  or  all  lytic  replicating  DNA  (Hr)  in  the 
nucleus.  A  more  realistic  situation  can  be  illustrated  by  superimposing  a  probability  distribution  on  the 
parameters  to  better  approximate  mixed  conditions  where  a  host  cell  may  contain  both  latent  and  lytic 
virus  in  varying  levels.  Such  a  modeling  technique  was  successfully  used  in  [6]  for  cellular  level  HIV  models 
to  account  for  variable  length  (with  uncertainty)  pathways.  In  models  of  this  type  the  state  variables  are 
the  expected  values  of  concentrations  (or  of  numbers  of  cells)  resulting  in  delay  differential  equation  models 
embodying  uncertainty  through  the  explicit  dependence  of  dynamics  on  probability  distributions.  Once 
again,  richer  data  sets  are  essential  to  establish  validity  for  such  models. 

Finally,  instead  of  a  single  viral  compartment  R  to  quantify  copies  of  viral  DNA  in  the  lytic  program,  we 
suggest  that  one  might  modify  the  model  to  describe  Immediate  Early,  Early,  and  Late  gene  expression 
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(RNA),  represented  by  compartments  i?i,  R2l  and  R3  for  an  induced  model  in  the  form 


dHL 

dt 


dHR 

dt 


dH^ 

dt 

dL 

dt 

dR\ 

dt 


(7 l  -  ol{s)  -  SL{s))  Hl  +  pHR 

(■ 1r  ~  Sr(s)  —  di(Vi)  -  p )  Hr  +  a(s)HL 

( dL  +  Sl(s))Hl  +  (dR  +  SR(s)  +  dj(Vj))  HR  —  pHN 

(7 l  -  a(s)  -  SL{s))  L  +  p(Ri  +  i?2  +  R3) 

(lR  -bi-  SR(s)  -  di(Vi)  -  p)  Ri  +  a(s)L 


(12) 


=  (k  +  1r  ~  b2  ~  SR(s)  -  di(Vi)  -  p)  R2  +  biRi 
=  (7 R  ~  ^3  —  dR{s)  —  di(Vi )  —  p)  R3  +  b2R2 

=  b3R3  —  (p+  dR  +  dj(Vi)  +  5r(s))  Vj 

and  VR(t)  =  Vfq  +  /  pVi(u)du.  The  three  parameters  b3,  b2,  and  b3  represent  the  rate  at  which  viral  DNA 
Jto 

moves  from  one  stage  of  the  lytic  program  to  the  next.  These  parameters  can  be  estimated  as  1/Ti,  I/T2, 
and  I/T3,  respectively,  where  T\,  T2:  and  T3  are  the  approximate  times  for  each  stage  of  the  lytic  program. 
Again,  one  would  expect  variability  in  these  times  across  cell  populations,  suggesting  a  desired  introduction 
of  probability  distributions  to  be  estimated  instead  of  the  times  themselves.  Corresponding  parameters  in 
this  proposed  model  and  model  (2)  would  not,  of  course,  necessarily  represent  the  same  quantities. 


dR2 

dt 

dR3 

dt 

dV, 

dt 


By  having  model  compartments  that  quantify  RNA  production  or  promoter  activity  from  genes  representative 
of  each  stage  of  the  lytic  cycle,  one  could  hope  to  predict  viral  reactivation  in  more  detail  and  compare  to 
experimental  gene  expression  data.  For  example,  ORF50,  vIL6,  K8.1  [23]  could  be  representative  of  the 
Immediate  Early,  Early,  and  Late  stages,  respectively.  A  single  compartment  L  can  represent  latent  gene 
expression,  primarily  ORF73  expression  [23]. 


There  may  be  underlying  biological  delays,  due  to  the  ordered  cascade  of  gene  expression  that  makes  up 
the  lytic  program,  that  are  not  captured  with  the  model  (2).  A  model  such  as  (12)  in  which  we  rewrite  the 
single  R  compartment  as  three  compartments  R\:  R2 ,  and  R3  representing  the  Immediate  Early,  Early,  and 
Late  phases  of  the  lytic  program  might  be  expected  to  more  closely  capture  the  biological  delays  (or  the 
associated  probability  distributions)  inherent  in  the  lytic  program  of  the  system.  To  enable  development 
and  validation  of  such  models,  experiments  involving  multi-compartment  longitudinal  observations  will  be 
required. 


In  summary,  the  results  reported  on  here  illustrate  well  a  common  finding  in  initial  modeling  efforts  for 
biological  systems:  the  need  for  more  longitudinal  data  and  the  need  for  different  types  of  observations  than 
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are  prevalent  in  current  experiments  (such  as  those  involving  reactivation  of  latent  viruses) .  Regarding  the 
initial  modeling  aspects  of  this  project,  we  recall  that  most  modeling  efforts  (including  this  one)  are  steps 
in  an  iterative  process  in  which  one  takes  experimental  observations  and  forms  statistical  and  mathematical 
models.  These  models,  when  used  in  inverse  problems,  suggest  inadequacies  in  both  modeling  and  the 
experimental  data  collected.  This  leads  to  model  modification  and  extension  as  well  as  new  experiments  to 
collect  data  necessary  to  validate  the  new  model. 
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